4 research outputs found

    DAB Join: A Distributed Adaptive and Balanced N-way Stream Window Join for Shared Nothing Clusters

    In this work, we present DAB Join, an adaptive operator that enables scalable processing of Multiway Windowed Stream Joins using a shared nothing cluster. DAB Join (1) supports any kind of join predicate, (2) minimizes the network cost while distributing the load equally across all cluster nodes, and (3) uses only one hop to distribute the data and execute the join, avoiding the distribution of intermediate results that may be very large. We have implemented DAB Join on top of an experimental system named Exastream, which supports the distributed execution of jobs expressed as DAGs of user-defined operators. Our experimental results on synthetic streams show that the algorithm is scalable. Additionally, DAB Join adapts to changes in stream input rates, which results in better execution times compared to non-adaptive alternatives.

    Towards Analytics Aware Ontology Based Access to Static and Streaming Data (Extended Version)

    Real-time analytics that requires integration and aggregation of heterogeneous and distributed streaming and static data is a typical task in many industrial scenarios, such as diagnostics of turbines at Siemens. The OBDA approach has great potential to facilitate such tasks; however, it has a number of limitations in dealing with analytics that restrict its use in important industrial applications. Based on our experience with Siemens, we argue that in order to overcome those limitations OBDA should be extended to become analytics, source, and cost aware. In this work we propose such an extension. In particular, we propose an ontology, mapping, and query language for OBDA in which aggregate and other analytical functions are first-class citizens. Moreover, we develop query optimisation techniques that allow analytical tasks over static and streaming data to be processed efficiently. We implement our approach in a system and evaluate it with Siemens turbine data.

    Ontology-Based Integration of Streaming and Static Relational Data with Optique

    Real-time processing of data coming from multiple heterogeneous data streams and static databases is a typical task in many industrial scenarios, such as diagnostics of large machines. A complex diagnostic task may require a collection of up to hundreds of queries over such data. Although many of these queries retrieve data of the same kind, such as temperature measurements, they access structurally different data sources. In this work we show how Semantic Technologies implemented in our system OPTIQUE can simplify such complex diagnostics by providing an abstraction layer, an ontology, that integrates heterogeneous data. In a nutshell, OPTIQUE allows complex diagnostic tasks to be expressed with just a few high-level semantic queries. The system can then automatically enrich these queries, translate them into a collection containing a large number of low-level data queries, and finally optimise and efficiently execute the collection in a heavily distributed environment. We will demo the benefits of OPTIQUE on a real-world scenario from Siemens.